Cross-entropy loss has long served as the main objective function for classification tasks. Widely used to train neural network classifiers, it combines empirical effectiveness with a probabilistic interpretation. Recently, following the success of self-supervised contrastive representation learning, supervised contrastive methods have been proposed for representation learning and have shown superior and more robust performance compared to training with cross-entropy loss alone. However, cross-entropy loss is still needed to train the final classification layer. In this work, we investigate the possibility of learning both the representation and the classifier with a single objective function that combines the robustness of contrastive learning and the probabilistic interpretation of cross-entropy loss. First, we revisit a previously proposed contrastive-based objective function that approximates cross-entropy loss and present a simple extension to learn the classifier jointly. Second, we propose a new version of supervised contrastive training that jointly learns the parameters of the classifier and the backbone of the network. We empirically show that our proposed objective functions yield significant improvements over standard cross-entropy loss, with greater training stability and robustness across various challenging settings.
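To make the idea concrete, below is a minimal sketch of one plausible instantiation of such a joint objective, assuming a PyTorch-style setup: the classifier weights are treated as learnable class prototypes and enter the supervised contrastive loss as always-present positives, so a single loss trains both the backbone and the classifier. The function and parameter names (`joint_supcon_loss`, `prototypes`, the temperature value) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_supcon_loss(features, labels, prototypes, temperature=0.1):
    """features: (B, D) backbone embeddings; labels: (B,) class ids;
    prototypes: (C, D) learnable classifier weights, one row per class."""
    z = F.normalize(features, dim=1)
    w = F.normalize(prototypes, dim=1)
    # The class prototypes join the batch as extra positives, so gradients
    # flow into both the backbone and the classifier from one loss.
    feats = torch.cat([z, w], dim=0)                           # (B + C, D)
    targets = torch.cat(
        [labels, torch.arange(w.size(0), device=labels.device)])
    sim = z @ feats.t() / temperature                          # (B, B + C)
    # Remove each anchor's similarity with itself from the denominator.
    self_mask = torch.eye(z.size(0), feats.size(0),
                          dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = labels.unsqueeze(1).eq(targets.unsqueeze(0)) & ~self_mask
    # Average log-likelihood over each anchor's positives.
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```

At test time the prototypes can double as the linear classifier (predict the class whose prototype has the highest similarity to the embedding), so no separate cross-entropy stage is required.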
Novelty detection methods identify samples that are not representative of a model's training set, thereby flagging misleading predictions and bringing greater flexibility and transparency at deployment time. However, research in this area has only considered novelty detection in the offline setting. Recently, a growing number of applications in the computer vision community require a more flexible framework, continual learning, in which new data representing new domains, new classes, or new tasks become available at different points in time. In this setting, novelty detection becomes more important, interesting, and challenging. This work identifies the crucial link between these two problems and investigates novelty detection under the continual learning setting. We formulate the continual novelty detection problem and present a benchmark in which we compare several novelty detection methods under different continual learning settings. We show that continual learning affects the behavior of novelty detection algorithms, while novelty detection can provide insights into the behavior of a continual learner. We further propose baselines and discuss possible research directions. We believe that the coupling of these two problems is a promising direction toward putting vision models into practice.
Artificial neural networks thrive at solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour in a distinct training phase. The resulting network resembles a static entity of knowledge, and attempts to extend this knowledge beyond the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions are: (1) a taxonomy and extensive overview of the state of the art; (2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks: Tiny ImageNet, the large-scale unbalanced iNaturalist, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.
A continual learning agent learns online from a non-stationary and never-ending stream of data. The key to such a learning process is to overcome catastrophic forgetting of previously seen data, a well-known problem for neural networks. To prevent forgetting, a replay buffer is usually employed to store previous data for rehearsal. Previous works often depend on task boundaries and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained-optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximates the feasible region defined by the original constraints. We show that this is equivalent to maximizing the diversity of samples in the replay buffer, using the parameter gradient as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparison with other alternatives in the continual learning setting. Further comparisons against state-of-the-art methods that rely on task boundaries show comparable or even better results for our method.
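The following is a rough sketch of the greedy idea under a PyTorch-style setup: each sample is represented by its flattened parameter gradient, and a candidate evicts the most redundant stored sample only when that swap increases gradient diversity. All names (`grad_feature`, `greedy_update`, `grad_bank`) and the exact swap rule are illustrative assumptions, not the paper's precise procedure.

```python
import torch
import torch.nn.functional as F

def grad_feature(model, loss_fn, x, y):
    """Flattened parameter gradient of the loss on a single example."""
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.flatten() for g in grads]).detach()

def greedy_update(buffer, grad_bank, sample, g, capacity):
    """Admit a new sample while keeping stored gradients mutually diverse."""
    if len(buffer) < capacity:
        buffer.append(sample)
        grad_bank.append(g)
        return
    G = F.normalize(torch.stack(grad_bank), dim=1)
    pair_sims = G @ G.t()                  # cosine similarities in the buffer
    pair_sims.fill_diagonal_(-1.0)
    # Most redundant stored sample: the one in the most similar pair.
    worst = int(pair_sims.max(dim=1).values.argmax())
    cand_sims = G @ F.normalize(g, dim=0)
    cand_sims[worst] = -1.0                # the evicted sample would be gone
    # Swap only if the candidate is less redundant than what it evicts.
    if cand_sims.max() < pair_sims[worst].max():
        buffer[worst] = sample
        grad_bank[worst] = g
```

Computing per-example gradients for every incoming sample is the expensive part; in practice the gradient can be taken over only the last layers, which is one way the greedy variant stays cheap.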
Humans can learn in a continuous manner: old, rarely utilized knowledge can be overwritten by new incoming information, while important, frequently used knowledge is prevented from being erased. In artificial learning systems, lifelong learning has so far focused mainly on accumulating knowledge over tasks and overcoming catastrophic forgetting. In this paper, we argue that, given the limited model capacity and the unlimited new information to be learned, knowledge has to be preserved or erased selectively. Inspired by neuroplasticity, we propose a novel approach for lifelong learning, coined Memory Aware Synapses (MAS). It computes the importance of the parameters of a neural network in an unsupervised and online manner: given a new sample fed to the network, MAS accumulates an importance measure for each parameter based on how sensitive the predicted output function is to a change in that parameter. When learning a new task, changes to important parameters can then be penalized, effectively preventing important knowledge related to previous tasks from being overwritten. Further, we show an interesting connection between a local version of our method and Hebb's rule, a model of the learning process in the brain. We test our method on a sequence of object recognition tasks and on the challenging problem of learning an embedding for predicting <subject, predicate, object> triplets. We show state-of-the-art performance and, for the first time, the ability to adapt the importance of the parameters based on unlabeled data towards what the network needs (not) to forget, which may vary depending on test conditions.
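A minimal PyTorch sketch of the two ingredients described above follows. Accumulating the gradient magnitude per mini-batch (rather than per individual sample, as in the paper) is a simplifying approximation here, and the names (`estimate_importance`, `mas_penalty`, `lam`) are ours.

```python
import torch

def estimate_importance(model, data_loader):
    """Importance of each parameter: average sensitivity of the squared L2
    norm of the network output to that parameter. Labels are never used."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    n_batches = 0
    for x, _ in data_loader:          # unsupervised: targets are ignored
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()   # d ||f(x)||^2 / d theta
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.abs()
        n_batches += 1
    return {n: v / max(n_batches, 1) for n, v in importance.items()}

def mas_penalty(model, importance, old_params, lam=1.0):
    """Quadratic penalty added to the new task's loss: parameters that were
    important for previous tasks are discouraged from drifting."""
    reg = sum((importance[n] * (p - old_params[n]).pow(2)).sum()
              for n, p in model.named_parameters() if n in importance)
    return lam * reg
```

Because the importance signal is the sensitivity of the learned function rather than of a supervised loss, it can be refreshed on unlabeled deployment data, which is what allows the adaptation to test conditions mentioned above.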
In this paper we introduce a model of lifelong learning based on a Network of Experts. New tasks/experts are learned and added to the model sequentially, building on what was learned before. To ensure the scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such a context, not addressed in the literature so far, is deciding which expert to deploy at test time. We introduce a set of gating autoencoders that learn a representation of the task at hand and, at test time, automatically forward the test sample to the relevant expert. This also brings memory efficiency, as only one expert network has to be loaded into memory at any given time. Furthermore, the autoencoders inherently capture the relatedness of one task to another, based on which the most relevant prior model for training a new expert, with fine-tuning or learning-without-forgetting, can be selected. We evaluate our method on image classification and video prediction problems.
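A minimal sketch of the gating step, assuming PyTorch. The one-hidden-layer autoencoder and the `route_to_expert` helper are illustrative; in practice the autoencoders would typically operate on pre-extracted features rather than raw inputs, and the architecture may differ.

```python
import torch
import torch.nn as nn

class TaskAutoencoder(nn.Module):
    """One small undercomplete autoencoder per task, trained only on that
    task's inputs so it reconstructs in-task samples with low error."""
    def __init__(self, in_dim, code_dim=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        self.decode = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        return self.decode(self.encode(x))

@torch.no_grad()
def route_to_expert(x, autoencoders):
    """Pick the task whose autoencoder reconstructs the sample best; only
    that expert network then needs to be loaded into memory."""
    errors = torch.stack([((ae(x) - x) ** 2).mean() for ae in autoencoders])
    return int(errors.argmin())
```

The same reconstruction errors, computed on a new task's data under the existing autoencoders, also provide the task-relatedness signal used to select the most relevant prior model for training the new expert.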
People with disabilities are subject to a variety of complex decision-making processes in diverse areas such as healthcare, employment, and government policy. These environments, which are often already opaque to the people they affect and lack adequate representation of disability perspectives, are rapidly adopting artificial intelligence (AI) technologies for data analytics to inform decision making, increasing the risk of harm from inappropriate or unfair algorithms. This paper presents a framework for critically examining AI data analytics technologies through a disability lens and investigates how the definition of disability chosen by the designers of an AI technology affects its impact on the disabled subjects of the analysis. We consider three conceptual models of disability: the medical model, the social model, and the relational model; and we show how AI technologies designed under each of these models differ so significantly as to be incompatible with and contradictory to one another. Through a discussion of common use cases of AI analytics in healthcare and government disability benefits, we illustrate specific considerations and decision points in the technology design process that affect power dynamics and inclusion in these settings and help determine whether the technologies are oriented toward marginalization or support. The framework we present can serve as a foundation for in-depth critical examination of AI technologies and for the development of design practices for disability-related AI analytics.